ABSTRACT

A collective of agents often needs to maximize a “world
utility” function that rates the performance of an entire
system, while subject to communication restrictions among
the agents. Such communication restrictions make it difficult
for agents that pursue their own “private” utilities to take
actions that also help optimize the world utility. Team
formation offers a solution to this problem: by joining other
agents, an agent can significantly increase its knowledge about
the environment, improving both its chances of optimizing its
own utility and the likelihood that doing so will contribute
to the world utility. In this article we show how utilities
that have previously been shown to be effective in collectives
can be modified to be more effective in domains with moderate
communication restrictions, resulting in performance
improvements of up to 75%. Additionally, we show that even
severe communication constraints can be overcome by forming
teams in which every agent of a team shares the same utility,
increasing performance by an additional 25%. We show that
utilities and team sizes can be manipulated to form the best
compromise between how “aligned” an agent’s utility is with
the world utility and how easily an agent can learn that
utility.

Categories and Subject Descriptors

I.2.11 [Artificial Intelligence]: Multiagent Systems
1. INTRODUCTION

Many methods exist for coordinating the actions of a multiagent
system when the agents can fully communicate with one another
[3, 4]. However, many problems impose communication
restrictions among the agents, rendering the coordination
problem more difficult [1]. Examples of these problems include
controlling collections of rovers, constellations of satellites,
and packet routers, where an agent may only be able to
communicate directly with a small number of other agents. In
all of these problems, the collective’s designer faces the
following difficult tasks:

• ensuring that, as far as the provided “world utility
function” is concerned, the agents do not work at
cross-purposes (i.e., making sure that the private utilities of
the agents and the world utility are “aligned”);

• ensuring that agents can achieve their private utilities
when they do not have access to a broad communication
network giving them access to global information.

These tasks can be addressed with the theory of collectives,
which has been successfully applied to multiple domains,
including packet routing over a data network, the congestion
game known as Arthur’s El Farol Bar problem [4], and the
coordination of multiple rovers learning sequences of actions.

The theory of collectives is concerned with the world
utility G(z), which is a function of the full worldline z.
The problem at hand is to find the z that maximizes G(z).
In addition to G, for each agent η there is a private utility
function g_η. The agents act to improve their individual
private utilities, even though we, as system designers, are
only concerned with the value of the world utility G. An
important property we want a private utility to have is
factoredness with respect to G, intuitively meaning that an
action taken by an agent that improves its private utility
also improves the world utility. In addition to being
factored, we want the agents’ private utility functions to have
high learnability, intuitively meaning that an agent’s utility
should be sensitive to its own actions and insensitive to the
actions of others. As a trivial example, any “team game” in
which all the private utilities equal G is factored, but has
low learnability, since all the agents’ actions have a
significant effect on the value of G.
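Both properties can be made more precise. The block below is a sketch in the style of definition common in the collectives literature, not a formula stated in this paper. Here z_η denotes agent η’s components of z and z_η̂ all remaining components, and the gradient-ratio measure of learnability assumes g_η is differentiable.

```latex
% Factoredness: for every pair of worldlines that differ only in agent eta's components,
\forall\, z, z' \text{ with } z_{\hat\eta} = z'_{\hat\eta}:\quad
  g_\eta(z) \ge g_\eta(z') \iff G(z) \ge G(z')

% Learnability (one possible measure): sensitivity to eta's own components
% relative to the components of all other agents
\lambda_{\eta, g_\eta}(z) =
  \frac{\lVert \nabla_{z_\eta} g_\eta(z) \rVert}{\lVert \nabla_{z_{\hat\eta}} g_\eta(z) \rVert}
```

For the team game g_η = G, the equivalence holds trivially. However, the denominator of λ collects the influence of every other agent on G, which is why its learnability is low.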

Consider difference utilities, which are of the form:

$$\mathrm{DU}_\eta(z) = G(z) - G(\mathrm{CL}_\eta(z)) \qquad (1)$$

where $\mathrm{CL}_\eta(z) = (z_{\hat\eta}, c_\eta)$ is the worldline z
with agent η’s components replaced by a pre-fixed clamping
parameter $c_\eta$ chosen from among η’s legal or illegal moves, and
$z_{\hat\eta}$ denotes the components of z that do not belong to η.
Such difference utilities are factored no matter what the choice
of clamping parameter, because the second term does not depend
on η’s state [4]. Furthermore, they usually have far better
learnability than a team game does, because the second term of
DU removes much of the effect of other agents (i.e., noise)
from η’s utility.
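The factoredness argument can be checked numerically on a toy problem. The sketch below is illustrative and rests on several assumptions not taken from the paper. Agents choose one of K nights, each agent’s components form a one-hot row, and the world utility is assumed to have the congestion form G(z) = Σ_k x_k exp(−x_k / c) with a hypothetical capacity c = 3. The clamping parameter defaults to the zero vector. Because CL_η(z) does not involve η’s own choice, changing only η’s action changes DU_η by exactly as much as it changes G.

```python
import math
from typing import List, Optional

State = List[List[float]]  # z[i] holds agent i's K components (one per night)


def world_utility(z: State, capacity: float = 3.0) -> float:
    # Assumed bar-style world utility: G(z) = sum_k x_k * exp(-x_k / capacity),
    # where x_k is the attendance on night k.
    x = [sum(row[j] for row in z) for j in range(len(z[0]))]
    return sum(xk * math.exp(-xk / capacity) for xk in x)


def clamp(z: State, eta: int, c_eta: Optional[List[float]] = None) -> State:
    # CL_eta(z): copy z and overwrite agent eta's components with the clamping
    # parameter c_eta (the zero vector by default, i.e. eta "stays home").
    zc = [list(row) for row in z]
    zc[eta] = list(c_eta) if c_eta is not None else [0.0] * len(z[eta])
    return zc


def difference_utility(z: State, eta: int) -> float:
    # Equation (1): DU_eta(z) = G(z) - G(CL_eta(z)).
    return world_utility(z) - world_utility(clamp(z, eta))


def one_hot(choices: List[int], k: int) -> State:
    # Encode each agent's chosen night as a one-hot row of length k.
    return [[1.0 if j == c else 0.0 for j in range(k)] for c in choices]


z = one_hot([0, 1, 1, 2, 0, 3, 1, 2], k=4)
z_alt = one_hot([1, 1, 1, 2, 0, 3, 1, 2], k=4)  # only agent 0 changes its night

# CL_0(z) and CL_0(z_alt) are identical, so the change in DU_0 is
# mathematically the same as the change in G.
d_world = world_utility(z_alt) - world_utility(z)
d_private = difference_utility(z_alt, 0) - difference_utility(z, 0)
print(math.isclose(d_world, d_private))  # True
```

Subtracting the clamped term removes the attendance contributed by the other agents, so DU_η varies mainly with η’s own choice. That is the learnability advantage over the team game described above.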


Mathematically, we represent the communication restrictions
as elements of the worldline that are not observable. Given a
worldline z, we can decompose it into observable components
z^o and hidden components z^h (we denote the concatenated
state by z = (z^o, z^h)). If the DU depends on any component
of z^h, then we cannot compute it directly. Instead, there are
several approximations to the DU that vary in their balance
between learnability and factoredness. In this paper we
propose four approximations¹:


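$$\mathrm{BTU}_\eta(z) = G(z) - G(\mathrm{CL}_\eta(z^o, \mathbf{0})) \qquad (2)$$

$$\mathrm{TTU}_\eta(z) = G((z^o, \mathbf{0})) - G(\mathrm{CL}_\eta(z^o, \mathbf{0})) \qquad (3)$$

$$\mathrm{BEU}_\eta(z) = G(z) - G(\mathrm{CL}_\eta(z^o, E[z^h \mid z^o])) \qquad (4)$$

$$\mathrm{EEU}_\eta(z) = G((z^o, E[z^h \mid z^o])) - G(\mathrm{CL}_\eta(z^o, E[z^h \mid z^o])) \qquad (5)$$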

where 0 is the vector whose components are all zero, CL_η
clamps all components of agent η to the zero vector, and E[·]
is the expectation operator. Note that the BTU and BEU
assume that the true world utility can be computed despite
the communication restriction. These two utilities are also
factored, since they are in the form of equation (1). However,
they may not be very learnable, because the second term uses
different information from the first, so less noise is
subtracted out. EEU does not have this problem, and with a
good estimate of z^h it may still be close to being factored.
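A minimal sketch of equations (2) to (5) on the same kind of toy problem follows. The world utility form, capacity, and one-hot encoding are assumptions chosen for illustration. The set `observed` lists the agents whose components η can see, and all other rows are hidden. The helper `estimate_hidden` is a hypothetical stand-in for E[z^h | z^o]: each hidden agent is assumed to behave like the average observed agent other than η, with a uniform prior when η observes nobody else. Excluding η keeps the clamped second terms independent of η’s own state.

```python
import math
from typing import Dict, List, Set

State = List[List[float]]  # z[i] holds agent i's K components


def world_utility(z: State, capacity: float = 3.0) -> float:
    # Assumed bar-style world utility: G(z) = sum_k x_k * exp(-x_k / capacity).
    x = [sum(row[j] for row in z) for j in range(len(z[0]))]
    return sum(xk * math.exp(-xk / capacity) for xk in x)


def clamp(z: State, eta: int) -> State:
    # CL_eta: set every component of agent eta to zero.
    zc = [list(row) for row in z]
    zc[eta] = [0.0] * len(z[eta])
    return zc


def fill_hidden(z: State, observed: Set[int], fill: List[float]) -> State:
    # Build (z^o, v): keep rows of observed agents, replace hidden rows by `fill`.
    return [list(row) if i in observed else list(fill) for i, row in enumerate(z)]


def estimate_hidden(z: State, observed: Set[int], eta: int) -> List[float]:
    # Hypothetical estimate of E[z^h | z^o]: mean row of observed agents other
    # than eta, or a uniform prior over the K nights if there are none.
    k = len(z[0])
    others = [i for i in observed if i != eta]
    if not others:
        return [1.0 / k] * k
    return [sum(z[i][j] for i in others) / len(others) for j in range(k)]


def approximations(z: State, eta: int, observed: Set[int]) -> Dict[str, float]:
    G = world_utility
    z_trunc = fill_hidden(z, observed, [0.0] * len(z[0]))               # (z^o, 0)
    z_est = fill_hidden(z, observed, estimate_hidden(z, observed, eta))  # (z^o, E[z^h|z^o])
    return {
        "BTU": G(z) - G(clamp(z_trunc, eta)),        # equation (2)
        "TTU": G(z_trunc) - G(clamp(z_trunc, eta)),  # equation (3)
        "BEU": G(z) - G(clamp(z_est, eta)),          # equation (4)
        "EEU": G(z_est) - G(clamp(z_est, eta)),      # equation (5)
    }


z = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
du = world_utility(z) - world_utility(clamp(z, 0))

# Full communication: agent 0 observes everyone, so every approximation equals DU.
full = approximations(z, eta=0, observed={0, 1, 2, 3})
print(all(math.isclose(v, du) for v in full.values()))  # True

# No communication: agent 0 observes only itself, and the BTU reduces to G(z).
isolated = approximations(z, eta=0, observed={0})
print(math.isclose(isolated["BTU"], world_utility(z)))  # True
```

In the isolated case, the BEU likewise differs from G(z) only by a constant, because its clamped second term no longer depends on anything agent 0 does.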

As discussed above, communication restrictions can have
serious negative effects on the utility functions of the agents.
One way to remedy this situation is to let agents form “teams”
that “share” information [2]. In this paper a team is defined
as an aggregation of agents in which each agent (1) belongs to
one and only one team, (2) receives the utility of the team,
and (3) shares information with its team members.
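The three defining properties of a team can be sketched as plain data-structure operations. The grouping of consecutive agent indices into teams, the rule that teammates are visible to one another, and the `team_utility` callback are all hypothetical choices for illustration. This section does not specify how a team’s utility is computed.

```python
from typing import Callable, Dict, List, Set


def form_teams(n_agents: int, team_size: int) -> List[List[int]]:
    # Property (1): split agents into disjoint teams. Grouping consecutive indices
    # is a hypothetical rule; the last team is smaller if sizes do not divide evenly.
    return [list(range(start, min(start + team_size, n_agents)))
            for start in range(0, n_agents, team_size)]


def pool_observations(teams: List[List[int]],
                      observes: Dict[int, Set[int]]) -> Dict[int, Set[int]]:
    # Property (3): each member sees the union of its teammates' observations,
    # plus the teammates themselves (assumed to be known through sharing).
    pooled: Dict[int, Set[int]] = {}
    for team in teams:
        shared = set(team).union(*(observes[a] for a in team))
        for a in team:
            pooled[a] = set(shared)
    return pooled


def assign_utilities(teams: List[List[int]],
                     team_utility: Callable[[List[int]], float]) -> Dict[int, float]:
    # Property (2): every member receives its own team's utility.
    return {a: team_utility(team) for team in teams for a in team}


teams = form_teams(10, team_size=5)
observes = {a: {a} for a in range(10)}  # each agent alone observes only itself
pooled = pool_observations(teams, observes)
print(teams)              # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
print(sorted(pooled[7]))  # [5, 6, 7, 8, 9]
```

A team size of 1 recovers the no-team setting, and a team size equal to the number of agents yields a single team that shares everything. In that case every agent receives the same utility, as in a team game.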

2. EXPERIMENTAL RESULTS

We conducted a series of experiments on a generalized version
of the El Farol Bar Problem described in [4]. The first set of
experiments was conducted without teams (team size = 1).
Figure 1 shows the performance of the four utilities with
different levels of communication. With high communication
levels, all the utilities converge to the DU. When
communication is very low, the BTU and BEU have the best
performance, because their first term, G, is not affected by
the communication restriction, and they converge to G when
communication is zero. However, these utilities have trouble
incorporating additional knowledge and cannot do better than
G when communication is below the 50% level. At most
communication levels the EEU performs best, since it is the
most learnable and is very close to being factored. Even
though it is fairly learnable, the TTU performs worst at most
communication levels, since it is not close to being factored.

Even using the best utility, EEU, a high level of performance
cannot be achieved if the communication level is too low.
However, if agents can form small teams in which information
sharing is allowed between team members, good performance is
possible even when communication between teams is low. While
team information sharing can be seen simply as increasing the
communication level, we assume it is added, under the new
constraints of team formation, on top of a different
communication system with a fixed communication level.
Figure 2 shows the tradeoffs between choices of team size at
a low level of communication. At most communication levels,
there is an optimal team size that lies between the extremes
of not having teams (team size = 1) and having only a single
team (team size = 100). As the teams grow, there is more
information sharing, but there is also more noise in each
agent’s utility, since that utility is influenced by the
actions of more agents. In our problem, the best team size is
typically around 5 or 10 agents. This optimum represents the
best balance between small teams, which produce a more
learnable utility, and large teams, which allow for more
information sharing.

¹The first two letters of each utility’s name indicate how the
two terms of the difference utility get their information:
“B” stands for “broadcast”, “T” stands for “truncated”, since
the hidden values are simply thrown away, and “E” stands for
“estimated”.

Figure 1: Performance of the four utility functions without
teams over a range of communication levels. For moderate
communication levels EEU performs best. For very low
communication BTU performs best, since it uses information
from the world utility.

Figure 2: Performance of the four utility functions at 10%
communication, using teams. EEU performs best for most team
sizes under normal learning time.